Pesquisa | Portal Regional da BVS

1.

Using generative AI to investigate medical imagery models and datasets.

Lang, Oran; Yaya-Stupp, Doron; Traynis, Ilana; Cole-Lewis, Heather; Bennett, Chloe R; Lyles, Courtney R; Lau, Charles; Irani, Michal; Semturs, Christopher; Webster, Dale R; Corrado, Greg S; Hassidim, Avinatan; Matias, Yossi; Liu, Yun; Hammel, Naama; Babenko, Boris.

EBioMedicine ; 102: 105075, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38565004

RESUMO

BACKGROUND: AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that aren't yet known to experts. METHODS: In this paper, we present a workflow for generating hypotheses to understand which visual signals in images are correlated with a classification model's predictions for a given task. This approach leverages an automatic visual explanation algorithm followed by interdisciplinary expert review. We propose the following 4 steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier ("StylEx"); (iii) Automatically detect, extract, and visualize the top visual attributes that the classifier is sensitive towards. For visualization, we independently modify each of these attributes to generate counterfactual visualizations for a set of images (i.e., what the image would look like with the attribute increased or decreased); (iv) Formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, present the discovered attributes and corresponding counterfactual visualizations to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health (e.g., whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries). FINDINGS: To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities-retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically-learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically-learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible, previously-unknown attributes based on the literature (e.g., differences in the fundus associated with self-reported sex, which were previously unknown). INTERPRETATION: Our approach enables hypotheses generation via attribute visualizations and has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models, as well as debug and design better datasets. Though not designed to infer causality, importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real world nature of healthcare delivery and socio-cultural factors, and hence interdisciplinary perspectives are critical in these investigations. Finally, we will release code to help researchers train their own StylEx models and analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes. FUNDING: Google.

Assuntos

Algoritmos , Catarata , Humanos , Cardiomegalia , Fundo de Olho , Inteligência Artificial

2.

An intentional approach to managing bias in general purpose embedding models.

Weng, Wei-Hung; Sellergen, Andrew; Kiraly, Atilla P; D'Amour, Alexander; Park, Jungyeon; Pilgrim, Rory; Pfohl, Stephen; Lau, Charles; Natarajan, Vivek; Azizi, Shekoofeh; Karthikesalingam, Alan; Cole-Lewis, Heather; Matias, Yossi; Corrado, Greg S; Webster, Dale R; Shetty, Shravya; Prabhakara, Shruthi; Eswaran, Krish; Celi, Leo A G; Liu, Yun.

Lancet Digit Health ; 6(2): e126-e130, 2024 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-38278614

RESUMO

Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components-GPPEs-from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.

Assuntos

Atenção à Saúde , Aprendizado de Máquina , Humanos , Viés , Algoritmos

3.

Risk Stratification for Diabetic Retinopathy Screening Order Using Deep Learning: A Multicenter Prospective Study.

Bora, Ashish; Tiwari, Richa; Bavishi, Pinal; Virmani, Sunny; Huang, Rayman; Traynis, Ilana; Corrado, Greg S; Peng, Lily; Webster, Dale R; Varadarajan, Avinash V; Pattanapongpaiboon, Warisara; Chopra, Reena; Ruamviboonsuk, Paisan.

Transl Vis Sci Technol ; 12(12): 11, 2023 12 01.

Artigo em Inglês | MEDLINE | ID: mdl-38079169

RESUMO

Purpose: Real-world evaluation of a deep learning model that prioritizes patients based on risk of progression to moderate or worse (MOD+) diabetic retinopathy (DR). Methods: This nonrandomized, single-arm, prospective, interventional study included patients attending DR screening at four centers across Thailand from September 2019 to January 2020, with mild or no DR. Fundus photographs were input into the model, and patients were scheduled for their subsequent screening from September 2020 to January 2021 in order of predicted risk. Evaluation focused on model sensitivity, defined as correctly ranking patients that developed MOD+ within the first 50% of subsequent screens. Results: We analyzed 1,757 patients, of which 52 (3.0%) developed MOD+. Using the model-proposed order, the model's sensitivity was 90.4%. Both the model-proposed order and mild/no DR plus HbA1c had significantly higher sensitivity than the random order (P < 0.001). Excluding one major (rural) site that had practical implementation challenges, the remaining sites included 567 patients and 15 (2.6%) developed MOD+. Here, the model-proposed order achieved 86.7% versus 73.3% for the ranking that used DR grade and hemoglobin A1c. Conclusions: The model can help prioritize follow-up visits for the largest subgroups of DR patients (those with no or mild DR). Further research is needed to evaluate the impact on clinical management and outcomes. Translational Relevance: Deep learning demonstrated potential for risk stratification in DR screening. However, real-world practicalities must be resolved to fully realize the benefit.

Assuntos

Aprendizado Profundo , Diabetes Mellitus , Retinopatia Diabética , Humanos , Retinopatia Diabética/diagnóstico , Retinopatia Diabética/epidemiologia , Estudos Prospectivos , Hemoglobinas Glicadas , Medição de Risco

4.

Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging.

Azizi, Shekoofeh; Culp, Laura; Freyberg, Jan; Mustafa, Basil; Baur, Sebastien; Kornblith, Simon; Chen, Ting; Tomasev, Nenad; Mitrovic, Jovana; Strachan, Patricia; Mahdavi, S Sara; Wulczyn, Ellery; Babenko, Boris; Walker, Megan; Loh, Aaron; Chen, Po-Hsuan Cameron; Liu, Yuan; Bavishi, Pinal; McKinney, Scott Mayer; Winkens, Jim; Roy, Abhijit Guha; Beaver, Zach; Ryan, Fiona; Krogue, Justin; Etemadi, Mozziyar; Telang, Umesh; Liu, Yun; Peng, Lily; Corrado, Greg S; Webster, Dale R; Fleet, David; Hinton, Geoffrey; Houlsby, Neil; Karthikesalingam, Alan; Norouzi, Mohammad; Natarajan, Vivek.

Nat Biomed Eng ; 7(6): 756-779, 2023 06.

Artigo em Inglês | MEDLINE | ID: mdl-37291435

RESUMO

Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such 'out of distribution' performance problem and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for 'Robust and Efficient Medical Imaging with Self-supervision'), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1-33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.

Assuntos

Aprendizado de Máquina , Aprendizado de Máquina Supervisionado , Diagnóstico por Imagem

5.

Lessons learned from translating AI from development to deployment in healthcare.

Widner, Kasumi; Virmani, Sunny; Krause, Jonathan; Nayar, Jay; Tiwari, Richa; Pedersen, Elin Rønby; Jeji, Divleen; Hammel, Naama; Matias, Yossi; Corrado, Greg S; Liu, Yun; Peng, Lily; Webster, Dale R.

Nat Med ; 29(6): 1304-1306, 2023 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-37248297

Assuntos

Inteligência Artificial , Atenção à Saúde

6.

Pathologist Validation of a Machine Learning-Derived Feature for Colon Cancer Risk Stratification.

L'Imperio, Vincenzo; Wulczyn, Ellery; Plass, Markus; Müller, Heimo; Tamini, Nicolò; Gianotti, Luca; Zucchini, Nicola; Reihs, Robert; Corrado, Greg S; Webster, Dale R; Peng, Lily H; Chen, Po-Hsuan Cameron; Lavitrano, Marialuisa; Liu, Yun; Steiner, David F; Zatloukal, Kurt; Pagni, Fabio.

JAMA Netw Open ; 6(3): e2254891, 2023 03 01.

Artigo em Inglês | MEDLINE | ID: mdl-36917112

RESUMO

Importance: Identifying new prognostic features in colon cancer has the potential to refine histopathologic review and inform patient care. Although prognostic artificial intelligence systems have recently demonstrated significant risk stratification for several cancer types, studies have not yet shown that the machine learning-derived features associated with these prognostic artificial intelligence systems are both interpretable and usable by pathologists. Objective: To evaluate whether pathologist scoring of a histopathologic feature previously identified by machine learning is associated with survival among patients with colon cancer. Design, Setting, and Participants: This prognostic study used deidentified, archived colorectal cancer cases from January 2013 to December 2015 from the University of Milano-Bicocca. All available histologic slides from 258 consecutive colon adenocarcinoma cases were reviewed from December 2021 to February 2022 by 2 pathologists, who conducted semiquantitative scoring for tumor adipose feature (TAF), which was previously identified via a prognostic deep learning model developed with an independent colorectal cancer cohort. Main Outcomes and Measures: Prognostic value of TAF for overall survival and disease-specific survival as measured by univariable and multivariable regression analyses. Interpathologist agreement in TAF scoring was also evaluated. Results: A total of 258 colon adenocarcinoma histopathologic cases from 258 patients (138 men [53%]; median age, 67 years [IQR, 65-81 years]) with stage II (n = 119) or stage III (n = 139) cancer were included. Tumor adipose feature was identified in 120 cases (widespread in 63 cases, multifocal in 31, and unifocal in 26). For overall survival analysis after adjustment for tumor stage, TAF was independently prognostic in 2 ways: TAF as a binary feature (presence vs absence: hazard ratio [HR] for presence of TAF, 1.55 [95% CI, 1.07-2.25]; P = .02) and TAF as a semiquantitative categorical feature (HR for widespread TAF, 1.87 [95% CI, 1.23-2.85]; P = .004). Interpathologist agreement for widespread TAF vs lower categories (absent, unifocal, or multifocal) was 90%, corresponding to a κ metric at this threshold of 0.69 (95% CI, 0.58-0.80). Conclusions and Relevance: In this prognostic study, pathologists were able to learn and reproducibly score for TAF, providing significant risk stratification on this independent data set. Although additional work is warranted to understand the biological significance of this feature and to establish broadly reproducible TAF scoring, this work represents the first validation to date of human expert learning from machine learning in pathology. Specifically, this validation demonstrates that a computationally identified histologic feature can represent a human-identifiable, prognostic feature with the potential for integration into pathology practice.

Assuntos

Adenocarcinoma , Neoplasias do Colo , Masculino , Humanos , Idoso , Neoplasias do Colo/diagnóstico , Patologistas , Inteligência Artificial , Aprendizado de Máquina , Medição de Risco

7.

A deep learning model for novel systemic biomarkers in photographs of the external eye: a retrospective study.

Babenko, Boris; Traynis, Ilana; Chen, Christina; Singh, Preeti; Uddin, Akib; Cuadros, Jorge; Daskivich, Lauren P; Maa, April Y; Kim, Ramasamy; Kang, Eugene Yu-Chuan; Matias, Yossi; Corrado, Greg S; Peng, Lily; Webster, Dale R; Semturs, Christopher; Krause, Jonathan; Varadarajan, Avinash V; Hammel, Naama; Liu, Yun.

Lancet Digit Health ; 5(5): e257-e264, 2023 05.

Artigo em Inglês | MEDLINE | ID: mdl-36966118

RESUMO

BACKGROUND: Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions. METHODS: We developed a deep learning system (DLS) that takes external eye photographs as input and predicts systemic parameters, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets). This DLS was trained using 123 130 images from 38 398 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles county, CA, USA. Evaluation focused on nine prespecified systemic parameters and leveraged three validation sets (A, B, C) spanning 25 510 patients with and without diabetes undergoing eye screening in three independent sites in Los Angeles county, CA, and the greater Atlanta area, GA, USA. We compared performance against baseline models incorporating available clinicodemographic variables (eg, age, sex, race and ethnicity, years with diabetes). FINDINGS: Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST >36·0 U/L, calcium <8·6 mg/dL, eGFR <60·0 mL/min/1·73 m2, haemoglobin <11·0 g/dL, platelets <150·0 × 103/µL, ACR ≥300 mg/g, and WBC <4·0 × 103/µL on validation set A (a population resembling the development datasets), with the area under the receiver operating characteristic curve (AUC) of the DLS exceeding that of the baseline by 5·3-19·9% (absolute differences in AUC). On validation sets B and C, with substantial patient population differences compared with the development datasets, the DLS outperformed the baseline for ACR ≥300·0 mg/g and haemoglobin <11·0 g/dL by 7·3-13·2%. INTERPRETATION: We found further evidence that external eye photographs contain biomarkers spanning multiple organ systems. Such biomarkers could enable accessible and non-invasive screening of disease. Further work is needed to understand the translational implications. FUNDING: Google.

Assuntos

Aprendizado Profundo , Retinopatia Diabética , Humanos , Estudos Retrospectivos , Cálcio , Retinopatia Diabética/diagnóstico , Biomarcadores , Albuminas

8.

Detection of signs of disease in external photographs of the eyes via deep learning.

Babenko, Boris; Mitani, Akinori; Traynis, Ilana; Kitade, Naho; Singh, Preeti; Maa, April Y; Cuadros, Jorge; Corrado, Greg S; Peng, Lily; Webster, Dale R; Varadarajan, Avinash; Hammel, Naama; Liu, Yun.

Nat Biomed Eng ; 6(12): 1370-1383, 2022 12.

Artigo em Inglês | MEDLINE | ID: mdl-35352000

RESUMO

Retinal fundus photographs can be used to detect a range of retinal conditions. Here we show that deep-learning models trained instead on external photographs of the eyes can be used to detect diabetic retinopathy (DR), diabetic macular oedema and poor blood glucose control. We developed the models using eye photographs from 145,832 patients with diabetes from 301 DR screening sites and evaluated the models on four tasks and four validation datasets with a total of 48,644 patients from 198 additional screening sites. For all four tasks, the predictive performance of the deep-learning models was significantly higher than the performance of logistic regression models using self-reported demographic and medical history data, and the predictions generalized to patients with dilated pupils, to patients from a different DR screening programme and to a general eye care programme that included diabetics and non-diabetics. We also explored the use of the deep-learning models for the detection of elevated lipid levels. The utility of external eye photographs for the diagnosis and management of diseases should be further validated with images from different cameras and patient populations.

Assuntos

Aprendizado Profundo , Retinopatia Diabética , Doenças Retinianas , Humanos , Sensibilidade e Especificidade , Retinopatia Diabética/diagnóstico por imagem , Fundo de Olho

9.

Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: a prospective interventional cohort study.

Ruamviboonsuk, Paisan; Tiwari, Richa; Sayres, Rory; Nganthavee, Variya; Hemarat, Kornwipa; Kongprayoon, Apinpat; Raman, Rajiv; Levinstein, Brian; Liu, Yun; Schaekermann, Mike; Lee, Roy; Virmani, Sunny; Widner, Kasumi; Chambers, John; Hersch, Fred; Peng, Lily; Webster, Dale R.

Lancet Digit Health ; 4(4): e235-e244, 2022 04.

Artigo em Inglês | MEDLINE | ID: mdl-35272972

RESUMO

BACKGROUND: Diabetic retinopathy is a leading cause of preventable blindness, especially in low-income and middle-income countries (LMICs). Deep-learning systems have the potential to enhance diabetic retinopathy screenings in these settings, yet prospective studies assessing their usability and performance are scarce. METHODS: We did a prospective interventional cohort study to evaluate the real-world performance and feasibility of deploying a deep-learning system into the health-care system of Thailand. Patients with diabetes and listed on the national diabetes registry, aged 18 years or older, able to have their fundus photograph taken for at least one eye, and due for screening as per the Thai Ministry of Public Health guidelines were eligible for inclusion. Eligible patients were screened with the deep-learning system at nine primary care sites under Thailand's national diabetic retinopathy screening programme. Patients with a previous diagnosis of diabetic macular oedema, severe non-proliferative diabetic retinopathy, or proliferative diabetic retinopathy; previous laser treatment of the retina or retinal surgery; other non-diabetic retinopathy eye disease requiring referral to an ophthalmologist; or inability to have fundus photograph taken of both eyes for any reason were excluded. Deep-learning system-based interpretations of patient fundus images and referral recommendations were provided in real time. As a safety mechanism, regional retina specialists over-read each image. Performance of the deep-learning system (accuracy, sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were measured against an adjudicated reference standard, provided by fellowship-trained retina specialists. This study is registered with the Thai national clinical trials registry, TCRT20190902002. FINDINGS: Between Dec 12, 2018, and March 29, 2020, 7940 patients were screened for inclusion. 7651 (96·3%) patients were eligible for study analysis, and 2412 (31·5%) patients were referred for diabetic retinopathy, diabetic macular oedema, ungradable images, or low visual acuity. For vision-threatening diabetic retinopathy, the deep-learning system had an accuracy of 94·7% (95% CI 93·0-96·2), sensitivity of 91·4% (87·1-95·0), and specificity of 95·4% (94·1-96·7). The retina specialist over-readers had an accuracy of 93·5 (91·7-95·0; p=0·17), a sensitivity of 84·8% (79·4-90·0; p=0·024), and specificity of 95·5% (94·1-96·7; p=0·98). The PPV for the deep-learning system was 79·2 (95% CI 73·8-84·3) compared with 75·6 (69·8-81·1) for the over-readers. The NPV for the deep-learning system was 95·5 (92·8-97·9) compared with 92·4 (89·3-95·5) for the over-readers. INTERPRETATION: A deep-learning system can deliver real-time diabetic retinopathy detection capability similar to retina specialists in community-based screening settings. Socioenvironmental factors and workflows must be taken into consideration when implementing a deep-learning system within a large-scale screening programme in LMICs. FUNDING: Google and Rajavithi Hospital, Bangkok, Thailand. TRANSLATION: For the Thai translation of the abstract see Supplementary Materials section.

Assuntos

Aprendizado Profundo , Diabetes Mellitus , Retinopatia Diabética , Edema Macular , Estudos de Coortes , Retinopatia Diabética/diagnóstico , Humanos , Edema Macular/diagnóstico , Estudos Prospectivos , Tailândia

10.

Deep Learning to Detect OCT-derived Diabetic Macular Edema from Color Retinal Photographs: A Multicenter Validation Study.

Liu, Xinle; Ali, Tayyeba K; Singh, Preeti; Shah, Ami; McKinney, Scott Mayer; Ruamviboonsuk, Paisan; Turner, Angus W; Keane, Pearse A; Chotcomwongse, Peranut; Nganthavee, Variya; Chia, Mark; Huemer, Josef; Cuadros, Jorge; Raman, Rajiv; Corrado, Greg S; Peng, Lily; Webster, Dale R; Hammel, Naama; Varadarajan, Avinash V; Liu, Yun; Chopra, Reena; Bavishi, Pinal.

Ophthalmol Retina ; 6(5): 398-410, 2022 05.

Artigo em Inglês | MEDLINE | ID: mdl-34999015

RESUMO

PURPOSE: To validate the generalizability of a deep learning system (DLS) that detects diabetic macular edema (DME) from 2-dimensional color fundus photographs (CFP), for which the reference standard for retinal thickness and fluid presence is derived from 3-dimensional OCT. DESIGN: Retrospective validation of a DLS across international datasets. PARTICIPANTS: Paired CFP and OCT of patients from diabetic retinopathy (DR) screening programs or retina clinics. The DLS was developed using data sets from Thailand, the United Kingdom, and the United States and validated using 3060 unique eyes from 1582 patients across screening populations in Australia, India, and Thailand. The DLS was separately validated in 698 eyes from 537 screened patients in the United Kingdom with mild DR and suspicion of DME based on CFP. METHODS: The DLS was trained using DME labels from OCT. The presence of DME was based on retinal thickening or intraretinal fluid. The DLS's performance was compared with expert grades of maculopathy and to a previous proof-of-concept version of the DLS. We further simulated the integration of the current DLS into an algorithm trained to detect DR from CFP. MAIN OUTCOME MEASURES: The superiority of specificity and noninferiority of sensitivity of the DLS for the detection of center-involving DME, using device-specific thresholds, compared with experts. RESULTS: The primary analysis in a combined data set spanning Australia, India, and Thailand showed the DLS had 80% specificity and 81% sensitivity, compared with expert graders, who had 59% specificity and 70% sensitivity. Relative to human experts, the DLS had significantly higher specificity (P = 0.008) and noninferior sensitivity (P < 0.001). In the data set from the United Kingdom, the DLS had a specificity of 80% (P < 0.001 for specificity of >50%) and a sensitivity of 100% (P = 0.02 for sensitivity of > 90%). CONCLUSIONS: The DLS can generalize to multiple international populations with an accuracy exceeding that of experts. The clinical value of this DLS to reduce false-positive referrals, thus decreasing the burden on specialist eye care, warrants a prospective evaluation.

Assuntos

Aprendizado Profundo , Diabetes Mellitus , Retinopatia Diabética , Edema Macular , Retinopatia Diabética/complicações , Retinopatia Diabética/diagnóstico , Humanos , Edema Macular/diagnóstico , Edema Macular/etiologia , Estudos Retrospectivos , Tomografia de Coerência Óptica/métodos , Estados Unidos

11.

Development and Assessment of an Artificial Intelligence-Based Tool for Skin Condition Diagnosis by Primary Care Physicians and Nurse Practitioners in Teledermatology Practices.

Jain, Ayush; Way, David; Gupta, Vishakha; Gao, Yi; de Oliveira Marinho, Guilherme; Hartford, Jay; Sayres, Rory; Kanada, Kimberly; Eng, Clara; Nagpal, Kunal; DeSalvo, Karen B; Corrado, Greg S; Peng, Lily; Webster, Dale R; Dunn, R Carter; Coz, David; Huang, Susan J; Liu, Yun; Bui, Peggy; Liu, Yuan.

JAMA Netw Open ; 4(4): e217249, 2021 04 01.

Artigo em Inglês | MEDLINE | ID: mdl-33909055

RESUMO

Importance: Most dermatologic cases are initially evaluated by nondermatologists such as primary care physicians (PCPs) or nurse practitioners (NPs). Objective: To evaluate an artificial intelligence (AI)-based tool that assists with diagnoses of dermatologic conditions. Design, Setting, and Participants: This multiple-reader, multiple-case diagnostic study developed an AI-based tool and evaluated its utility. Primary care physicians and NPs retrospectively reviewed an enriched set of cases representing 120 different skin conditions. Randomization was used to ensure each clinician reviewed each case either with or without AI assistance; each clinician alternated between batches of 50 cases in each modality. The reviews occurred from February 21 to April 28, 2020. Data were analyzed from May 26, 2020, to January 27, 2021. Exposures: An AI-based assistive tool for interpreting clinical images and associated medical history. Main Outcomes and Measures: The primary analysis evaluated agreement with reference diagnoses provided by a panel of 3 dermatologists for PCPs and NPs. Secondary analyses included diagnostic accuracy for biopsy-confirmed cases, biopsy and referral rates, review time, and diagnostic confidence. Results: Forty board-certified clinicians, including 20 PCPs (14 women [70.0%]; mean experience, 11.3 [range, 2-32] years) and 20 NPs (18 women [90.0%]; mean experience, 13.1 [range, 2-34] years) reviewed 1048 retrospective cases (672 female [64.2%]; median age, 43 [interquartile range, 30-56] years; 41â¯920 total reviews) from a teledermatology practice serving 11 sites and provided 0 to 5 differential diagnoses per case (mean [SD], 1.6 [0.7]). The PCPs were located across 12 states, and the NPs practiced in primary care without physician supervision across 9 states. The NPs had a mean of 13.1 (range, 2-34) years of experience and practiced in primary care without physician supervision across 9 states. Artificial intelligence assistance was significantly associated with higher agreement with reference diagnoses. For PCPs, the increase in diagnostic agreement was 10% (95% CI, 8%-11%; P < .001), from 48% to 58%; for NPs, the increase was 12% (95% CI, 10%-14%; P < .001), from 46% to 58%. In secondary analyses, agreement with biopsy-obtained diagnosis categories of maglignant, precancerous, or benign increased by 3% (95% CI, -1% to 7%) for PCPs and by 8% (95% CI, 3%-13%) for NPs. Rates of desire for biopsies decreased by 1% (95% CI, 0-3%) for PCPs and 2% (95% CI, 1%-3%) for NPs; the rate of desire for referrals decreased by 3% (95% CI, 1%-4%) for PCPs and NPs. Diagnostic agreement on cases not indicated for a dermatologist referral increased by 10% (95% CI, 8%-12%) for PCPs and 12% (95% CI, 10%-14%) for NPs, and median review time increased slightly by 5 (95% CI, 0-8) seconds for PCPs and 7 (95% CI, 5-10) seconds for NPs per case. Conclusions and Relevance: Artificial intelligence assistance was associated with improved diagnoses by PCPs and NPs for 1 in every 8 to 10 cases, indicating potential for improving the quality of dermatologic care.

Assuntos

Inteligência Artificial , Diagnóstico por Computador , Profissionais de Enfermagem , Médicos de Atenção Primária , Dermatopatias/diagnóstico , Adulto , Dermatologia , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Encaminhamento e Consulta , Telemedicina

12.

Predicting the risk of developing diabetic retinopathy using deep learning.

Bora, Ashish; Balasubramanian, Siva; Babenko, Boris; Virmani, Sunny; Venugopalan, Subhashini; Mitani, Akinori; de Oliveira Marinho, Guilherme; Cuadros, Jorge; Ruamviboonsuk, Paisan; Corrado, Greg S; Peng, Lily; Webster, Dale R; Varadarajan, Avinash V; Hammel, Naama; Liu, Yun; Bavishi, Pinal.

Lancet Digit Health ; 3(1): e10-e19, 2021 01.

Artigo em Inglês | MEDLINE | ID: mdl-33735063

RESUMO

BACKGROUND: Diabetic retinopathy screening is instrumental to preventing blindness, but scaling up screening is challenging because of the increasing number of patients with all forms of diabetes. We aimed to create a deep-learning system to predict the risk of patients with diabetes developing diabetic retinopathy within 2 years. METHODS: We created and validated two versions of a deep-learning system to predict the development of diabetic retinopathy in patients with diabetes who had had teleretinal diabetic retinopathy screening in a primary care setting. The input for the two versions was either a set of three-field or one-field colour fundus photographs. Of the 575â431 eyes in the development set 28â899 had known outcomes, with the remaining 546â532 eyes used to augment the training process via multitask learning. Validation was done on one eye (selected at random) per patient from two datasets: an internal validation (from EyePACS, a teleretinal screening service in the USA) set of 3678 eyes with known outcomes and an external validation (from Thailand) set of 2345 eyes with known outcomes. FINDINGS: The three-field deep-learning system had an area under the receiver operating characteristic curve (AUC) of 0·79 (95% CI 0·77-0·81) in the internal validation set. Assessment of the external validation set-which contained only one-field colour fundus photographs-with the one-field deep-learning system gave an AUC of 0·70 (0·67-0·74). In the internal validation set, the AUC of available risk factors was 0·72 (0·68-0·76), which improved to 0·81 (0·77-0·84) after combining the deep-learning system with these risk factors (p<0·0001). In the external validation set, the corresponding AUC improved from 0·62 (0·58-0·66) to 0·71 (0·68-0·75; p<0·0001) following the addition of the deep-learning system to available risk factors. INTERPRETATION: The deep-learning systems predicted diabetic retinopathy development using colour fundus photographs, and the systems were independent of and more informative than available risk factors. Such a risk stratification tool might help to optimise screening intervals to reduce costs while improving vision-related outcomes. FUNDING: Google.

Assuntos

Aprendizado Profundo , Retinopatia Diabética/diagnóstico , Idoso , Área Sob a Curva , Técnicas de Diagnóstico Oftalmológico , Feminino , Humanos , Estimativa de Kaplan-Meier , Masculino , Pessoa de Meia-Idade , Fotografação , Prognóstico , Curva ROC , Reprodutibilidade dos Testes , Medição de Risco/métodos

13.

Longitudinal Screening for Diabetic Retinopathy in a Nationwide Screening Program: Comparing Deep Learning and Human Graders.

Limwattanayingyong, Jirawut; Nganthavee, Variya; Seresirikachorn, Kasem; Singalavanija, Tassapol; Soonthornworasiri, Ngamphol; Ruamviboonsuk, Varis; Rao, Chetan; Raman, Rajiv; Grzybowski, Andrzej; Schaekermann, Mike; Peng, Lily H; Webster, Dale R; Semturs, Christopher; Krause, Jonathan; Sayres, Rory; Hersch, Fred; Tiwari, Richa; Liu, Yun; Ruamviboonsuk, Paisan.

J Diabetes Res ; 2020: 8839376, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-33381600

RESUMO

OBJECTIVE: To evaluate diabetic retinopathy (DR) screening via deep learning (DL) and trained human graders (HG) in a longitudinal cohort, as case spectrum shifts based on treatment referral and new-onset DR. METHODS: We randomly selected patients with diabetes screened twice, two years apart within a nationwide screening program. The reference standard was established via adjudication by retina specialists. Each patient's color fundus photographs were graded, and a patient was considered as having sight-threatening DR (STDR) if the worse eye had severe nonproliferative DR, proliferative DR, or diabetic macular edema. We compared DR screening via two modalities: DL and HG. For each modality, we simulated treatment referral by excluding patients with detected STDR from the second screening using that modality. RESULTS: There were 5,738 patients (12.3% STDR) in the first screening. DL and HG captured different numbers of STDR cases, and after simulated referral and excluding ungradable cases, 4,148 and 4,263 patients remained in the second screening, respectively. The STDR prevalence at the second screening was 5.1% and 6.8% for DL- and HG-based screening, respectively. Along with the prevalence decrease, the sensitivity for both modalities decreased from the first to the second screening (DL: from 95% to 90%, p = 0.008; HG: from 74% to 57%, p < 0.001). At both the first and second screenings, the rate of false negatives for the DL was a fifth that of HG (0.5-0.6% vs. 2.9-3.2%). CONCLUSION: On 2-year longitudinal follow-up of a DR screening cohort, STDR prevalence decreased for both DL- and HG-based screening. Follow-up screenings in longitudinal DR screening can be more difficult and induce lower sensitivity for both DL and HG, though the false negative rate was substantially lower for DL. Our data may be useful for health-economics analyses of longitudinal screening settings.

Assuntos

Aprendizado Profundo , Retinopatia Diabética/diagnóstico por imagem , Fundo de Olho , Interpretação de Imagem Assistida por Computador , Edema Macular/diagnóstico por imagem , Programas de Rastreamento , Fotografação , Idoso , Proliferação de Células , Retinopatia Diabética/epidemiologia , Feminino , Humanos , Incidência , Estudos Longitudinais , Edema Macular/epidemiologia , Masculino , Pessoa de Meia-Idade , Programas Nacionais de Saúde , Valor Preditivo dos Testes , Prevalência , Reprodutibilidade dos Testes , Índice de Gravidade de Doença , Tailândia/epidemiologia

14.

A deep learning system for differential diagnosis of skin diseases.

Liu, Yuan; Jain, Ayush; Eng, Clara; Way, David H; Lee, Kang; Bui, Peggy; Kanada, Kimberly; de Oliveira Marinho, Guilherme; Gallegos, Jessica; Gabriele, Sara; Gupta, Vishakha; Singh, Nalini; Natarajan, Vivek; Hofmann-Wellenhof, Rainer; Corrado, Greg S; Peng, Lily H; Webster, Dale R; Ai, Dennis; Huang, Susan J; Liu, Yun; Dunn, R Carter; Coz, David.

Nat Med ; 26(6): 900-908, 2020 06.

Artigo em Inglês | MEDLINE | ID: mdl-32424212

RESUMO

Skin conditions affect 1.9 billion people. Because of a shortage of dermatologists, most cases are seen instead by general practitioners with lower diagnostic accuracy. We present a deep learning system (DLS) to provide a differential diagnosis of skin conditions using 16,114 de-identified cases (photographs and clinical data) from a teledermatology practice serving 17 sites. The DLS distinguishes between 26 common skin conditions, representing 80% of cases seen in primary care, while also providing a secondary prediction covering 419 skin conditions. On 963 validation cases, where a rotating panel of three board-certified dermatologists defined the reference standard, the DLS was non-inferior to six other dermatologists and superior to six primary care physicians (PCPs) and six nurse practitioners (NPs) (top-1 accuracy: 0.66 DLS, 0.63 dermatologists, 0.44 PCPs and 0.40 NPs). These results highlight the potential of the DLS to assist general practitioners in diagnosing skin conditions.

Assuntos

Aprendizado Profundo , Diagnóstico Diferencial , Dermatopatias/diagnóstico , Acne Vulgar/diagnóstico , Adulto , Negro ou Afro-Americano , Nativos do Alasca , Asiático , Carcinoma Basocelular/diagnóstico , Carcinoma de Células Escamosas/diagnóstico , Dermatite Seborreica/diagnóstico , Dermatologistas , Eczema/diagnóstico , Feminino , Foliculite/diagnóstico , Hispânico ou Latino , Humanos , Índios Norte-Americanos , Ceratose Seborreica/diagnóstico , Masculino , Melanoma/diagnóstico , Pessoa de Meia-Idade , Havaiano Nativo ou Outro Ilhéu do Pacífico , Profissionais de Enfermagem , Fotografação , Médicos de Atenção Primária , Psoríase/diagnóstico , Neoplasias Cutâneas/diagnóstico , Telemedicina , Verrugas/diagnóstico , População Branca

15.

Author Correction: Detection of anaemia from retinal fundus images via deep learning.

Mitani, Akinori; Huang, Abigail; Venugopalan, Subhashini; Corrado, Greg S; Peng, Lily; Webster, Dale R; Hammel, Naama; Liu, Yun; Varadarajan, Avinash V.

Nat Biomed Eng ; 4(2): 242, 2020 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-32051580

RESUMO

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

16.

Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning.

Varadarajan, Avinash V; Bavishi, Pinal; Ruamviboonsuk, Paisan; Chotcomwongse, Peranut; Venugopalan, Subhashini; Narayanaswamy, Arunachalam; Cuadros, Jorge; Kanai, Kuniyoshi; Bresnick, George; Tadarati, Mongkol; Silpa-Archa, Sukhum; Limwattanayingyong, Jirawut; Nganthavee, Variya; Ledsam, Joseph R; Keane, Pearse A; Corrado, Greg S; Peng, Lily; Webster, Dale R.

Nat Commun ; 11(1): 130, 2020 01 08.

Artigo em Inglês | MEDLINE | ID: mdl-31913272

RESUMO

Center-involved diabetic macular edema (ci-DME) is a major cause of vision loss. Although the gold standard for diagnosis involves 3D imaging, 2D imaging by fundus photography is usually used in screening settings, resulting in high false-positive and false-negative calls. To address this, we train a deep learning model to predict ci-DME from fundus photographs, with an ROC-AUC of 0.89 (95% CI: 0.87-0.91), corresponding to 85% sensitivity at 80% specificity. In comparison, retinal specialists have similar sensitivities (82-85%), but only half the specificity (45-50%, p < 0.001). Our model can also detect the presence of intraretinal fluid (AUC: 0.81; 95% CI: 0.81-0.86) and subretinal fluid (AUC 0.88; 95% CI: 0.85-0.91). Using deep learning to make predictions via simple 2D images without sophisticated 3D-imaging equipment and with better than specialist performance, has broad relevance to many other applications in medical imaging.

Assuntos

Retinopatia Diabética/diagnóstico por imagem , Edema Macular/diagnóstico por imagem , Idoso , Aprendizado Profundo , Retinopatia Diabética/genética , Feminino , Humanos , Imageamento Tridimensional , Edema Macular/genética , Masculino , Pessoa de Meia-Idade , Mutação , Fotografação , Retina/diagnóstico por imagem , Tomografia de Coerência Óptica

17.

Detection of anaemia from retinal fundus images via deep learning.

Mitani, Akinori; Huang, Abigail; Venugopalan, Subhashini; Corrado, Greg S; Peng, Lily; Webster, Dale R; Hammel, Naama; Liu, Yun; Varadarajan, Avinash V.

Nat Biomed Eng ; 4(1): 18-27, 2020 01.

Artigo em Inglês | MEDLINE | ID: mdl-31873211

RESUMO

Owing to the invasiveness of diagnostic tests for anaemia and the costs associated with screening for it, the condition is often undetected. Here, we show that anaemia can be detected via machine-learning algorithms trained using retinal fundus images, study participant metadata (including race or ethnicity, age, sex and blood pressure) or the combination of both data types (images and study participant metadata). In a validation dataset of 11,388 study participants from the UK Biobank, the fundus-image-only, metadata-only and combined models predicted haemoglobin concentration (in g dl-1) with mean absolute error values of 0.73 (95% confidence interval: 0.72-0.74), 0.67 (0.66-0.68) and 0.63 (0.62-0.64), respectively, and with areas under the receiver operating characteristic curve (AUC) values of 0.74 (0.71-0.76), 0.87 (0.85-0.89) and 0.88 (0.86-0.89), respectively. For 539 study participants with self-reported diabetes, the combined model predicted haemoglobin concentration with a mean absolute error of 0.73 (0.68-0.78) and anaemia an AUC of 0.89 (0.85-0.93). Automated anaemia screening on the basis of fundus images could particularly aid patients with diabetes undergoing regular retinal imaging and for whom anaemia can increase morbidity and mortality risks.

Assuntos

Anemia/diagnóstico por imagem , Retina/diagnóstico por imagem , Aprendizado Profundo , Feminino , Fundo de Olho , Humanos , Masculino , Pessoa de Meia-Idade , Estudos Prospectivos , Curva ROC

18.

Remote Tool-Based Adjudication for Grading Diabetic Retinopathy.

Schaekermann, Mike; Hammel, Naama; Terry, Michael; Ali, Tayyeba K; Liu, Yun; Basham, Brian; Campana, Bilson; Chen, William; Ji, Xiang; Krause, Jonathan; Corrado, Greg S; Peng, Lily; Webster, Dale R; Law, Edith; Sayres, Rory.

Transl Vis Sci Technol ; 8(6): 40, 2019 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-31867141

RESUMO

PURPOSE: To present and evaluate a remote, tool-based system and structured grading rubric for adjudicating image-based diabetic retinopathy (DR) grades. METHODS: We compared three different procedures for adjudicating DR severity assessments among retina specialist panels, including (1) in-person adjudication based on a previously described procedure (Baseline), (2) remote, tool-based adjudication for assessing DR severity alone (TA), and (3) remote, tool-based adjudication using a feature-based rubric (TA-F). We developed a system allowing graders to review images remotely and asynchronously. For both TA and TA-F approaches, images with disagreement were reviewed by all graders in a round-robin fashion until disagreements were resolved. Five panels of three retina specialists each adjudicated a set of 499 retinal fundus images (1 panel using Baseline, 2 using TA, and 2 using TA-F adjudication). Reliability was measured as grade agreement among the panels using Cohen's quadratically weighted kappa. Efficiency was measured as the number of rounds needed to reach a consensus for tool-based adjudication. RESULTS: The grades from remote, tool-based adjudication showed high agreement with the Baseline procedure, with Cohen's kappa scores of 0.948 and 0.943 for the two TA panels, and 0.921 and 0.963 for the two TA-F panels. Cases adjudicated using TA-F were resolved in fewer rounds compared with TA (P < 0.001; standard permutation test). CONCLUSIONS: Remote, tool-based adjudication presents a flexible and reliable alternative to in-person adjudication for DR diagnosis. Feature-based rubrics can help accelerate consensus for tool-based adjudication of DR without compromising label quality. TRANSLATIONAL RELEVANCE: This approach can generate reference standards to validate automated methods, and resolve ambiguous diagnoses by integrating into existing telemedical workflows.

19.

Deep Learning and Glaucoma Specialists: The Relative Importance of Optic Disc Features to Predict Glaucoma Referral in Fundus Photographs.

Phene, Sonia; Dunn, R Carter; Hammel, Naama; Liu, Yun; Krause, Jonathan; Kitade, Naho; Schaekermann, Mike; Sayres, Rory; Wu, Derek J; Bora, Ashish; Semturs, Christopher; Misra, Anita; Huang, Abigail E; Spitze, Arielle; Medeiros, Felipe A; Maa, April Y; Gandhi, Monica; Corrado, Greg S; Peng, Lily; Webster, Dale R.

Ophthalmology ; 126(12): 1627-1639, 2019 12.

Artigo em Inglês | MEDLINE | ID: mdl-31561879

RESUMO

PURPOSE: To develop and validate a deep learning (DL) algorithm that predicts referable glaucomatous optic neuropathy (GON) and optic nerve head (ONH) features from color fundus images, to determine the relative importance of these features in referral decisions by glaucoma specialists (GSs) and the algorithm, and to compare the performance of the algorithm with eye care providers. DESIGN: Development and validation of an algorithm. PARTICIPANTS: Fundus images from screening programs, studies, and a glaucoma clinic. METHODS: A DL algorithm was trained using a retrospective dataset of 86 618 images, assessed for glaucomatous ONH features and referable GON (defined as ONH appearance worrisome enough to justify referral for comprehensive examination) by 43 graders. The algorithm was validated using 3 datasets: dataset A (1205 images, 1 image/patient; 18.1% referable), images adjudicated by panels of GSs; dataset B (9642 images, 1 image/patient; 9.2% referable), images from a diabetic teleretinal screening program; and dataset C (346 images, 1 image/patient; 81.7% referable), images from a glaucoma clinic. MAIN OUTCOME MEASURES: The algorithm was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity for referable GON and glaucomatous ONH features. RESULTS: The algorithm's AUC for referable GON was 0.945 (95% confidence interval [CI], 0.929-0.960) in dataset A, 0.855 (95% CI, 0.841-0.870) in dataset B, and 0.881 (95% CI, 0.838-0.918) in dataset C. Algorithm AUCs ranged between 0.661 and 0.973 for glaucomatous ONH features. The algorithm showed significantly higher sensitivity than 7 of 10 graders not involved in determining the reference standard, including 2 of 3 GSs, and showed higher specificity than 3 graders (including 1 GS), while remaining comparable to others. For both GSs and the algorithm, the most crucial features related to referable GON were: presence of vertical cup-to-disc ratio of 0.7 or more, neuroretinal rim notching, retinal nerve fiber layer defect, and bared circumlinear vessels. CONCLUSIONS: A DL algorithm trained on fundus images alone can detect referable GON with higher sensitivity than and comparable specificity to eye care providers. The algorithm maintained good performance on an independent dataset with diagnoses based on a full glaucoma workup.

Assuntos

Aprendizado Profundo , Glaucoma de Ângulo Aberto/diagnóstico , Oftalmologistas , Disco Óptico/patologia , Doenças do Nervo Óptico/diagnóstico , Especialização , Idoso , Área Sob a Curva , Conjuntos de Dados como Assunto , Feminino , Humanos , Masculino , Pessoa de Meia-Idade , Fibras Nervosas/patologia , Curva ROC , Encaminhamento e Consulta , Células Ganglionares da Retina/patologia , Estudos Retrospectivos , Sensibilidade e Especificidade

20.

Erratum: Author Correction: Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program.

Ruamviboonsuk, Paisan; Krause, Jonathan; Chotcomwongse, Peranut; Sayres, Rory; Raman, Rajiv; Widner, Kasumi; Campana, Bilson J L; Phene, Sonia; Hemarat, Kornwipa; Tadarati, Mongkol; Silpa-Archa, Sukhum; Limwattanayingyong, Jirawut; Rao, Chetan; Kuruvilla, Oscar; Jung, Jesse; Tan, Jeffrey; Orprayoon, Surapong; Kangwanwongpaisan, Chawawat; Sukumalpaiboon, Ramase; Luengchaichawang, Chainarong; Fuangkaew, Jitumporn; Kongsap, Pipat; Chualinpha, Lamyong; Saree, Sarawuth; Kawinpanitan, Srirut; Mitvongsa, Korntip; Lawanasakol, Siriporn; Thepchatri, Chaiyasit; Wongpichedchai, Lalita; Corrado, Greg S; Peng, Lily; Webster, Dale R.

NPJ Digit Med ; 2: 68, 2019.

Artigo em Inglês | MEDLINE | ID: mdl-31341955

RESUMO

[This corrects the article DOI: 10.1038/s41746-019-0099-8.].

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA